Connecticut
AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters
Reducing the model redundancy is an important task to deploy complex deep learning models to resource-limited or time-sensitive devices. Directly regularizing or modifying weight values makes pruning procedure less robust and sensitive to the choice of hyperparameters, and it also requires prior knowledge to tune different hyperparameters for different models. To build a better generalized and easy-to-use pruning method, we propose AutoPrune, which prunes the network through optimizing a set of trainable auxiliary parameters instead of original weights. The instability and noise during training on auxiliary parameters will not directly affect weight values, which makes pruning process more robust to noise and less sensitive to hyperparameters. Moreover, we design gradient update rules for auxiliary parameters to keep them consistent with pruning tasks. Our method can automatically eliminate network redundancy with recoverability, relieving the complicated prior knowledge required to design thresholding functions, and reducing the time for trial and error. We evaluate our method with LeNet and VGGlike on MNIST and CIFAR-10 datasets, and with AlexNet, ResNet and MobileNet on ImageNet to establish the scalability of our work. Results show that our model achieves state-of-the-art sparsity, e.g.
UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles
Unmanned Aerial Vehicles (UAVs), equipped with cameras, are employed in numerous applications, including aerial photography, surveillance, and agriculture. In these applications, robust object detection and tracking are essential for the effective deployment of UAVs. However, existing benchmarks for UAV applications are mainly designed for traditional 2D perception tasks, restricting the development of real-world applications that require a 3D understanding of the environment. Furthermore, despite recent advancements in single-UAV perception, limited views of a single UAV platform significantly constrain its perception capabilities over long distances or in occluded areas. To address these challenges, we introduce UAV3D - a benchmark designed to advance research in both 3D and collaborative 3D perception tasks with UAVs. UAV3D comprises 1,000 scenes, each of which has 20 frames with fully annotated 3D bounding boxes on vehicles. We provide the benchmark for four 3D perception tasks: single-UAV 3D object detection, single-UAV object tracking, collaborative-UAV 3D object detection, and collaborative-UAV object tracking.
FedGMark: Certifiably Robust Watermarking for Federated Graph Learning Yuxin Yang 1,2 Qiang Li1 Yuan Hong 3 Binghui Wang
Federated graph learning (FedGL) is an emerging learning paradigm to collaboratively train graph data from various clients. However, during the development and deployment of FedGL models, they are susceptible to illegal copying and model theft. Backdoor-based watermarking is a well-known method for mitigating these attacks, as it offers ownership verification to the model owner. We take the first step to protect the ownership of FedGL models via backdoor-based watermarking. Existing techniques have challenges in achieving the goal: 1) they either cannot be directly applied or yield unsatisfactory performance; 2) they are vulnerable to watermark removal attacks; and 3) they lack of formal guarantees. To address all the challenges, we propose FedGMark, the first certified robust backdoor-based watermarking for FedGL. FedGMark leverages the unique graph structure and client information in FedGL to learn customized and diverse watermarks. It also designs a novel GL architecture that facilitates defending against both the empirical and theoretically worst-case watermark removal attacks.
Wearable ring translates sign language into text
American Sign Language (ASL) has long enabled real-time conversations for English-speaking people who are deaf and hard-of-hearing. But discussions often face significant lags when one or more conversants aren't fluent in the language system. But by combining deep learning artificial intelligence and micro-sonar technologies, researchers at Cornell University are developing a new wearable to help overcome the communication barriers. With further refinement, SpellRing may one day facilitate entire conversations regardless of your ASL comprehension skills. ASL's earliest iterations developed in the early 18th century at the American School for the Deaf in Hartford, Connecticut.
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Zhang, Siwei, Xiong, Yun, Tang, Yateng, Chen, Xi, Jia, Zian, Gu, Zehao, Xu, Jiarong, Zhang, Jiawei
Temporal graph neural networks (TGNNs) have shown remarkable performance in temporal graph modeling. However, real-world temporal graphs often possess rich textual information, giving rise to temporal text-attributed graphs (TTAGs). Such combination of dynamic text semantics and evolving graph structures introduces heightened complexity. Existing TGNNs embed texts statically and rely heavily on encoding mechanisms that biasedly prioritize structural information, overlooking the temporal evolution of text semantics and the essential interplay between semantics and structures for synergistic reinforcement. To tackle these issues, we present \textbf{{Cross}}, a novel framework that seamlessly extends existing TGNNs for TTAG modeling. The key idea is to employ the advanced large language models (LLMs) to extract the dynamic semantics in text space and then generate expressive representations unifying both semantics and structures. Specifically, we propose a Temporal Semantics Extractor in the {Cross} framework, which empowers the LLM to offer the temporal semantic understanding of node's evolving contexts of textual neighborhoods, facilitating semantic dynamics. Subsequently, we introduce the Semantic-structural Co-encoder, which collaborates with the above Extractor for synthesizing illuminating representations by jointly considering both semantic and structural information while encouraging their mutual reinforcement. Extensive experimental results on four public datasets and one practical industrial dataset demonstrate {Cross}'s significant effectiveness and robustness.
Cross-platform Prediction of Depression Treatment Outcome Using Location Sensory Data on Smartphones
Sahoo, Soumyashree, Shende, Chinmaey, Hossain, Md. Zakir, Patel, Parit, Niu, Yushuo, Wang, Xinyu, Ware, Shweta, Bi, Jinbo, Kamath, Jayesh, Russel, Alexander, Song, Dongjin, Yang, Qian, Wang, Bing
Currently, depression treatment relies on closely monitoring patients response to treatment and adjusting the treatment as needed. Using self-reported or physician-administrated questionnaires to monitor treatment response is, however, burdensome, costly and suffers from recall bias. In this paper, we explore using location sensory data collected passively on smartphones to predict treatment outcome. To address heterogeneous data collection on Android and iOS phones, the two predominant smartphone platforms, we explore using domain adaptation techniques to map their data to a common feature space, and then use the data jointly to train machine learning models. Our results show that this domain adaptation approach can lead to significantly better prediction than that with no domain adaptation. In addition, our results show that using location features and baseline self-reported questionnaire score can lead to F1 score up to 0.67, comparable to that obtained using periodic self-reported questionnaires, indicating that using location data is a promising direction for predicting depression treatment outcome.
Bayesian Optimization for Robust Identification of Ornstein-Uhlenbeck Model
Xu, Jinwen, Lu, Qin, Bar-Shalom, Yaakov
This paper deals with the identification of the stochastic Ornstein-Uhlenbeck (OU) process error model, which is characterized by an inverse time constant, and the unknown variances of the process and observation noises. Although the availability of the explicit expression of the log-likelihood function allows one to obtain the maximum likelihood estimator (MLE), this entails evaluating the nontrivial gradient and also often struggles with local optima. To address these limitations, we put forth a sample-efficient global optimization approach based on the Bayesian optimization (BO) framework, which relies on a Gaussian process (GP) surrogate model for the objective function that effectively balances exploration and exploitation to select the query points. Specifically, each evaluation of the objective is implemented efficiently through the Kalman filter (KF) recursion. Comprehensive experiments on various parameter settings and sampling intervals corroborate that BO-based estimator consistently outperforms MLE implemented by the steady-state KF approximation and the expectation-maximization algorithm (whose derivation is a side contribution) in terms of root mean-square error (RMSE) and statistical consistency, confirming the effectiveness and robustness of the BO for identification of the stochastic OU process. Notably, the RMSE values produced by the BO-based estimator are smaller than the classical Cram\'{e}r-Rao lower bound, especially for the inverse time constant, estimating which has been a long-standing challenge. This seemingly counterintuitive result can be explained by the data-driven prior for the learning parameters indirectly injected by BO through the GP prior over the objective function.
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Ning, Kanghui, Pan, Zijie, Liu, Yu, Jiang, Yushan, Zhang, James Y., Rasul, Kashif, Schneider, Anderson, Ma, Lintao, Nevmyvaka, Yuriy, Song, Dongjin
Recently, Large Language Models (LLMs) and Foundation Models (FMs) have become prevalent for time series forecasting tasks. However, fine-tuning large language models (LLMs) for forecasting enables the adaptation to specific domains but may not generalize well across diverse, unseen datasets. Meanwhile, existing time series foundation models (TSFMs) lack inherent mechanisms for domain adaptation and suffer from limited interpretability, making them suboptimal for zero-shot forecasting. To this end, we present TS-RAG, a retrieval-augmented generation based time series forecasting framework that enhances the generalization capability and interpretability of TSFMs. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant time series segments from a dedicated knowledge database, incorporating contextual patterns for the given time series query. Next, we develop a learnable Mixture-of-Experts (MoE)-based augmentation module, which dynamically fuses retrieved time series patterns with the TSFM's representation of the input query, improving forecasting accuracy without requiring task-specific fine-tuning. Thorough empirical studies on seven public benchmark datasets demonstrate that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming TSFMs by up to 6.51% across diverse domains and showcasing desired interpretability.
HyperGCL: Multi-Modal Graph Contrastive Learning via Learnable Hypergraph Views
Saifuddin, Khaled Mohammed, Ji, Shihao, Akbas, Esra
--Recent advancements in Graph Contrastive Learning (GCL) have demonstrated remarkable effectiveness in improving graph representations. However, relying on predefined augmentations (e.g., node dropping, edge perturbation, attribute masking) may result in the loss of task-relevant information and a lack of adaptability to diverse input data. Furthermore, the selection of negative samples remains rarely explored. In this paper, we introduce HyperGCL, a novel multimodal GCL framework from a hypergraph perspective. HyperGCL constructs three distinct hypergraph views by jointly utilizing the input graph's structure and attributes, enabling a comprehensive integration of multiple modalities in contrastive learning. A learnable adaptive topology augmentation technique enhances these views by preserving important relations and filtering out noise. View-specific encoders capture essential characteristics from each view, while a network-aware contrastive loss leverages the underlying topology to define positive and negative samples effectively. Extensive experiments on benchmark datasets demonstrate that HyperGCL achieves state-of-the-art node classification performance. Building on the success of contrastive learning (CL) in computer vision and natural language processing [1], [2], CL approaches have been extended to graph data--known as Graph Contrastive Learning (GCL)--where Graph Neural Networks (GNNs) learn robust representations by maximizing agreement between augmented graph views [3]-[6]. First, they often depend on handcrafted augmentations such as node dropping, edge perturbation, and attribute masking.